High-Performance Phylogeny Reconstruction Under Maximum Parsimony
Abstract
The similarity of the molecular matter of the organisms on Earth suggests that they all share a common ancestor. Thus any set of species is related, and this relationship is called a phylogeny. The links (or evolutionary relationships) among a set of organisms (or taxa) form a phylogenetic tree, where modern organisms are placed at the leaves and ancestral organisms occupy internal nodes, with the edges of the tree denoting evolutionary relationships. Scientists are interested in evolutionary trees for the usual reasons of scientific curiosity. However, phylogenetic analysis is not just an academic exercise. Phylogenies are the organizing principle for most biological knowledge. As such, they are a crucial tool in identifying emerging diseases, predicting disease outbreaks, and protecting ecosystems from invasive species [Bader et al., 2001; Cracraft, 2002]. The greatest impact of phylogenetics will be reconstructing the Tree of Life, the evolutionary history of all known organisms. No one knows precisely how many organisms exist in the world. Commonly cited estimates range from 10 to 100 million species. Today, only about 1.7 million species are known, and a very small fraction (on the order of 0.4%) are included in any sort of phylogenetic tree [Yates et al., 2004]. Despite the societal benefits of phylogenetic trees, reconstructing the evolutionary history from present-day taxa is a very difficult problem. For n organisms, there are (2n − 5)(2n − 7) · · · (5)(3) distinct binary trees, each a possible hypothesis for the "true" evolutionary history. (There are over 13 billion possible trees for 13 taxa.) Since the size of the tree space increases exponentially with the number of taxa, it is impossible to explore all possible hypotheses within a reasonable time frame. Most phylogenetic methods limit themselves to exhaustive searches on extremely small datasets (< 50 sequences) or heuristic strategies for larger datasets.
Another difficulty lies in assessing the accuracy of the reconstructed tree. Short of traveling back in time, there is no way of determining whether the proposed evolutionary history is 100% correct. The best conclusion is simply the best hypothesis of what might have happened. There is an ongoing search for tree-building methods that are the most robust (i.e., more likely to estimate the true topology even when the evolutionary assumptions are violated), consistent (converging on the true topology as more data are added), and efficient (converging on the true topology most quickly). …
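The tree count quoted in the abstract can be checked with a few lines of Python. This is only a minimal sketch: it assumes the count refers to unrooted binary trees with n labeled leaves, for which the product of odd numbers from 3 up to 2n − 5 applies, and the function name `num_unrooted_binary_trees` is introduced here for illustration and does not come from the paper.

```python
def num_unrooted_binary_trees(n: int) -> int:
    """Return (2n - 5)(2n - 7)...(5)(3), the number of distinct
    unrooted binary trees on n labeled taxa (assumed interpretation)."""
    if n < 3:
        raise ValueError("the formula applies for n >= 3 taxa")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 5 (empty product for n = 3).
    for k in range(3, 2 * n - 4, 2):
        count *= k
    return count


if __name__ == "__main__":
    # Show how quickly the search space grows with the number of taxa.
    for n in (4, 5, 10, 13, 20, 50):
        print(f"{n:>3} taxa: {num_unrooted_binary_trees(n):,} trees")
```

For 13 taxa the function returns 13,749,310,575, which matches the "over 13 billion" figure in the abstract. At 50 taxa the count already has more than 70 digits, which illustrates why exhaustive search is restricted to very small datasets.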
Similar resources
An experimental study comparing linguistic phylogenetic reconstruction methods
This paper reports a simulation study comparing and evaluating the performance of different linguistic phylogeny reconstruction methods on model datasets for which the true trees are known. UPGMA performed least well, then (in ascending order) neighbor joining, the method of Gray & Atkinson and finally maximum parsimony. Weighting characters greatly improves the accuracy of maximum parsimony an...
Short Quartet Puzzling: A New Quartet-Based Phylogeny Reconstruction Algorithm
Quartet-based phylogeny reconstruction methods, such as Quartet Puzzling, were introduced in the hope that they might be competitive with maximum likelihood methods, without being as computationally intensive. However, despite the numerous quartet-based methods that have been developed, their performance in simulation has been disappointing. In particular, Ranwez and Gascuel, the developers of ...
The Effect of Natural Selection on Phylogeny Reconstruction Algorithms
We study the effect of natural selection on the performance of phylogeny reconstruction algorithms using Avida, a software platform that maintains a population of digital organisms (self-replicating computer programs) that evolve subject to natural selection, mutation, and drift. We compare the performance of neighbor-joining and maximum parsimony algorithms on these Avida populations to the pe...
Performance of Supertree Methods on Various Dataset Decompositions
Many large-scale phylogenetic reconstruction methods attempt to solve hard optimization problems (such as Maximum Parsimony (MP) and Maximum Likelihood (ML)), but they are limited severely by the number of taxa that they can handle in a reasonable time frame. A standard heuristic approach to this problem is the divide-and-conquer strategy: decompose the dataset into smaller subsets, solve the s...
Deuterostome phylogeny and the sister group of the chordates: evidence from molecules and morphology.
Complete coding regions of the 18S rRNA gene of an enteropneust hemichordate and an echinoid and ophiuroid echinoderm were obtained and aligned with 18S rRNA gene sequences of all major chordate clades and four outgroups. Gene sequences were analyzed to test morphological character phylogenies and to assess the strength of the signal. Maximum-parsimony analysis of the sequences fails to support...
Fixed Parameter Tractability of Binary Near-Perfect Phylogenetic Tree Reconstruction
We consider the problem of finding a Steiner minimum tree in a hypercube. Specifically, given n terminal vertices in an m-dimensional cube and a parameter q, we compute the Steiner minimum tree in time O(72^q + 8^q nm), under the assumption that the length of the minimum Steiner tree is at most m + q. This problem has extensive applications in taxonomy and biology. The Steiner tree problem in hypercu...
Publication date: 2004